Draft

Too many things:

  • read in a RMF (in the quest for understanding how to compare model to data)
  • play around with "typing" arrays based on their units
  • can we have a frames back end for FITS table files (perhaps ocnvert ARF to CSV to get some more experience with frames?)?

Any of these require actual parsing of the header data...

So, back to the ARF file from the previous notebook.

I should note that up-to-date information on the FITS standard is available!

Unrelated ideas (although they do involve parsing, mainly XML responses from a web server):

  • given a name, call a name resolver (e.g. CADC) to get a location; then Chandra footprint server (which has images). Now, name server isn't actually needed.

  • explain @astroprop code


In [1]:
import qualified Data.ByteString.Char8 as B8

cts <- B8.readFile "../data/src.arf"

chunk :: Int
chunk = 2880

splitBS :: B8.ByteString -> Int -> Int -> B8.ByteString
splitBS inBS s n = B8.take n (B8.drop s inBS)

Cheat since we know how long the header is...


In [2]:
hdrBS = splitBS cts chunk (6*chunk)

In [3]:
B8.take 80 hdrBS
B8.drop (6*chunk-80) hdrBS


"XTENSION= 'BINTABLE'           / binary table extension                         "
"                                                                                "

Re-use some of the code. However, let's tweak the types:

XXX probably drop this, as don't really take it anywhere and have other types that can be used to represent "type safety" (e.g. Keyword).


In [4]:
newtype Card = Card B8.ByteString
  deriving (Eq, Show)

-- | Given a string, split off the next 80 characters.
getCard1 :: B8.ByteString -> (Card, B8.ByteString)
getCard1 bs = let (ls, rs) = B8.splitAt 80 bs
              in (Card ls, rs)

With this, I can not mix up the unparsed data with a card, since they are a different type (previously I used type which is just a synonym).


In [5]:
(a,_) = getCard1 hdrBS
a


Card "XTENSION= 'BINTABLE'           / binary table extension                         "

In [6]:
getCard1 a


Couldn't match expected type ‘B8.ByteString’ with actual type ‘Card’
In the first argument of ‘getCard1’, namely ‘a’
In the expression: getCard1 a

XXX hmmm; this doesn't really show off the type safety. may have to wait until have something more; perhaps the parse-a-header unit. Of course, the Card stuff is likely a digression if the aim is to use a parser combinator library a la attoparsec.

I can continue my "simplification mania":


In [7]:
import Control.Arrow (first)

In [8]:
:type first


first :: forall (a :: * -> * -> *) b c d. Arrow a => a b c -> a (b, d) (c, d)

In [9]:
:type first not (True,"False")


first not (True,"False") :: (Bool, [Char])

In [10]:
first not (True,"False")


(False,"False")

In [11]:
-- | Given a string, split off the next 80 characters. The Arrow version.
getCard :: B8.ByteString -> (Card, B8.ByteString)
getCard = first Card . B8.splitAt 80

In [12]:
fst (getCard hdrBS)


Card "XTENSION= 'BINTABLE'           / binary table extension                         "

XXX Insert a quite hilarious joke about fst and first.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

How about attoparsec => drop the card business?


In [13]:
import Data.Attoparsec.ByteString.Char8 hiding (take)
import Control.Applicative ((<*), (*>), (<|>), (<$>), optional)

Let's start with a keyword: 1 to 8 alphanumeric-ish characters:


In [14]:
-- simplify rules
keywordStart :: Parser Char
keywordStart = satisfy (inClass ['A' .. 'Z'])

In [15]:
parse keywordStart (B8.pack "X")


Done "" 'X'

In [16]:
Done leftover parseval = parse keywordStart (B8.pack "XTENSION= 'blah blah'")

In [17]:
parseval


'X'

In [18]:
leftover


"TENSION= 'blah blah'"

In [19]:
parseOnly keywordStart (B8.pack "Q")


Right 'Q'

In [20]:
parseOnly keywordStart (B8.pack "q")


Left "Failed reading: satisfyWith"

In [21]:
parseOnly keywordStart (B8.pack "QQ")


Right 'Q'

In [22]:
parseOnly (keywordStart <* endOfInput) (B8.pack "QQ")


Left "endOfInput"

In [23]:
keywordBody1 :: Parser Char
keywordBody1 = satisfy (inClass (['A' .. 'Z'] ++ ['0' .. '9'] ++ "_-"))

In [24]:
keywordBody :: Parser Char
keywordBody = keywordStart <|> satisfy (inClass ['0'..'9']) <|> satisfy (\c -> c == '_' || c == '-')

In [25]:
parseOnly keywordBody (B8.pack "6")


Right '6'

In [26]:
keywordUnbounded :: Parser String
keywordUnbounded = do
  fchar <- keywordStart
  body <- many' keywordBody
  return (fchar : body)

In [27]:
parseOnly keywordUnbounded (B8.pack "XTENSION= blah blah")


Right "XTENSION"

In [28]:
parseOnly keywordUnbounded (B8.pack "XTENSIONKEYWORD = blah blah")


Right "XTENSIONKEYWORD"

In [29]:
-- Parse 0 up to n copies of the parser, returning the results.
--
-- It is expected that the counter is 0 or more.
upTo :: Int -> Parser a -> Parser [a]
upTo n p | n < 1     = return []
         | otherwise = do
            mx <- optional p
            case mx of
              Just x -> do
                xs <- upTo (n-1) p
                return (x : xs)
              _ -> return []

In [30]:
:type optional


optional :: forall (f :: * -> *) a. Alternative f => f a -> f (Maybe a)

In [31]:
parseOnly (upTo 3 (char 'X')) (B8.pack "XX")


Right "XX"

In [32]:
parseOnly (upTo 3 (char 'X')) (B8.pack "XXXX X")


Right "XXX"

In [33]:
parseOnly (upTo 3 (char 'X')) (B8.pack "YXXX")


Right ""

In [34]:
-- Is keyword type useful here?
--
-- Ord is needed for the Map
newtype Keyword = Keyword String deriving (Eq, Ord, Show)

-- Note: this parses the first 8 characters (i.e. requires
-- spaces after "short" keywords).
--
keyword :: Parser Keyword
keyword = do
  fchar <- keywordStart
  body <- upTo 7 keywordBody
  let res = fchar : body
      nres = length res
  count (8-nres) space -- ignore the return value
  return (Keyword (fchar : body))

In [35]:
testBS = (B8.take 40 (B8.drop 80 hdrBS)) -- want to show the remainder, and a < 8 char keyword
Done remainder kword = parse keyword testBS

In [36]:
testBS


"BITPIX  =                    8 / 8-bit b"

In [37]:
kword


Keyword "BITPIX"

In [38]:
remainder


"=                    8 / 8-bit b"

What about values? Note: column lengths can be relaxed here (as there's a suggested fixed format but can also be relaxed):

  • 'string' ... where '' is used to represent a single quote character (there's also a minimum length of the string, since the trailing ' must be in column 20 or later, so trailing whitespace is ignored)
  • T ... (or F) - a boolean, which is in column 30
  • n ... - an integer
  • f ... - a floating-point value, which may or may not be in exponential form, but if so uses E.

where the numeric values end in column 30 (I am ignoring complex numbers, and the ... part should be /...).


In [39]:
-- Should this store an Integer rather than Int?
--
data KeyValue = KeyString String | KeyFloat Double | KeyInt Int | KeyBool Bool deriving (Eq, Show)

In the following the skipSpace (before and handling of comments after) are left to the caller


In [40]:
-- | Remove trailing space characters
rstrip :: B8.ByteString -> B8.ByteString
rstrip = go
  where
    go bs = case B8.unsnoc bs of
      Just (rbs,c) | c == ' ' -> go rbs
      _ -> bs

In [41]:
stringVal :: Parser KeyValue
stringVal = do
  let quote = '\''
  char quote
  body <- takeTill (== quote)
  return (KeyString (B8.unpack (rstrip body)))

In [42]:
boolVal :: Parser KeyValue
boolVal = 
  let true  = char 'T' *> return (KeyBool True)
      false = char 'F' *> return (KeyBool False)
  in true <|> false

In [43]:
parseOnly boolVal (B8.pack "T ")
parseOnly boolVal (B8.pack "F")
parseOnly boolVal (B8.pack "'T'")


Right (KeyBool True)
Right (KeyBool False)
Left "Failed reading: satisfyWith"

Use floatingOrInteger from the scientific package -- Nopity nope nope since it takes 1.23E23 as an integer.


In [44]:
intVal1 :: Parser KeyValue
intVal1 = do
  ans <- signed decimal
  return (KeyInt ans)
  
intVal :: Parser KeyValue
intVal = KeyInt <$> signed decimal

In [45]:
parseOnly intVal (B8.pack "123")
parseOnly intVal (B8.pack "123.45") 
parseOnly intVal (B8.pack "-123")


Right (KeyInt 123)
Right (KeyInt 123)
Right (KeyInt (-123))

In [46]:
-- Support E+23 or E-23
exponential :: Parser Int
exponential = char 'E' *> signed decimal

-- this does not support 1E+23, it needs to be X.Y...
floatVal :: Parser KeyValue
floatVal = do
  msign <- optional (char '-' <|> char '+')
  ldigs <- many1' digit
  char '.'
  rdigs <- many1' digit
  mexp <- optional exponential
  let f = read (ldigs ++ '.' : rdigs)
      g = case mexp of
        Just eval | eval > 0 -> f * 10 ^ eval
                  | eval < 0 -> f / 10 ^ (abs eval)
                  | otherwise -> f
        _ -> f
        
      s = case msign of
        Just '-' -> -1
        _ -> 1
        
  return (KeyFloat (s*g))

In [47]:
parseOnly floatVal (B8.pack "1.23E-43")
parseOnly floatVal (B8.pack "-0.0001E3")
parseOnly floatVal (B8.pack "123")


Right (KeyFloat 1.23e-43)
Right (KeyFloat (-0.1))
Left "not enough input"

In [48]:
value :: Parser KeyValue
value = stringVal <|> boolVal <|> floatVal <|> intVal

In [49]:
map (parseOnly value . B8.pack) ["'A string  ' ", "T", "123", "-1.23E+47"]


[Right (KeyString "A string"),Right (KeyBool True),Right (KeyInt 123),Right (KeyFloat (-1.23e47))]

In [50]:
import Data.Maybe (fromMaybe)

-- Handle any description. I could also parse out any unit, but leave that for now.
comment :: Parser String
comment = do
  skipSpace
  mcbs <- optional (char '/' >> skipSpace >> many' anyChar)
  -- TODO: strip trailing white space
  return (fromMaybe "" mcbs)

In [51]:
-- Should the 80-character limitation be tracked by the parser, or can we split into chunks
-- beforehand?
card :: Parser (Keyword, KeyValue)
card = do
  key <- keyword
  -- have to have "= " but then need to also eat up any spaces
  -- before the value.
  char '=' >> many1' space
  val <- value
  _ <- comment
  return (key, val)

In [52]:
splitBS hdrBS 0 80
splitBS hdrBS 80 80
parseOnly card (splitBS hdrBS 0 80)
parseOnly card (splitBS hdrBS 80 80)


"XTENSION= 'BINTABLE'           / binary table extension                         "
"BITPIX  =                    8 / 8-bit bytes                                    "
Right (Keyword "XTENSION",KeyString "BINTABLE")
Right (Keyword "BITPIX",KeyInt 8)

In [53]:
import Data.List (unfoldr)

In [54]:
getCards :: B8.ByteString -> [B8.ByteString]
getCards = unfoldr go
  where
    go bs | B8.null bs = Nothing
          | otherwise  = Just (B8.splitAt 80 bs)
          
hdrCards = getCards hdrBS

In [55]:
parseOnly card (head hdrCards)


Right (Keyword "XTENSION",KeyString "BINTABLE")

In [56]:
map (parseOnly card) (take 5 hdrCards)


[Right (Keyword "XTENSION",KeyString "BINTABLE"),Right (Keyword "BITPIX",KeyInt 8),Right (Keyword "NAXIS",KeyInt 2),Right (Keyword "NAXIS1",KeyInt 12),Right (Keyword "NAXIS2",KeyInt 1070)]

In [57]:
mapM (parseOnly card) (take 5 hdrCards)


Right [(Keyword "XTENSION",KeyString "BINTABLE"),(Keyword "BITPIX",KeyInt 8),(Keyword "NAXIS",KeyInt 2),(Keyword "NAXIS1",KeyInt 12),(Keyword "NAXIS2",KeyInt 1070)]

In [58]:
mapM (parseOnly card) hdrCards


Left "Failed reading: satisfyWith"

In [59]:
import Data.Either (isLeft)

-- find the first card that fails (returns a Left value)
head (filter (isLeft . snd) (zip hdrCards (map (parseOnly card) hdrCards)))


("HISTORY  TOOL  :mkwarf   2014-11-26T05:38:12                            ASC00001",Left "Failed reading: satisfyWith")

So, good types can only take you so far in protecting yourself against errors in your data model.


In [60]:
import Control.Monad (void)

-- Skip blank lines, END lines, and those that are "metadata" lines (e.g. HISTORY/COMMENT)
fixedCard :: Parser (Maybe (Keyword, KeyValue))
fixedCard = 
  (do
    key <- keyword
    mchar <- optional (char '=')
    case mchar of
      Just _ -> do
        many1' space
        val <- value
        void comment
        return (Just (key, val))
      
      _ -> many' anyChar >> return Nothing
  )
  <|>
  (string (B8.pack "END") >> skipSpace >> return Nothing)
  <|>
  (skipSpace >> return Nothing)

In [61]:
head (filter (isLeft . snd) (zip hdrCards (map (parseOnly fixedCard) hdrCards)))


Prelude.head: empty list

In [62]:
import qualified Data.Map as Map
import Data.Maybe (catMaybes)

Right mcs = mapM (parseOnly fixedCard) hdrCards
hdrMap = Map.fromList (catMaybes mcs)

In [63]:
Map.lookup (Keyword "OBJECT") hdrMap


Just (KeyString "3C273-JET")


In [64]:
-- import qualified Data.Sequence as Seq -- no longer using

In [65]:
-- Represent a binary table
--
-- Could the bitpix be a phantom argument ? Does this even make sense?
--
-- Assumes:
--      XTENSION=BINTABLE, NAXIS=2, PCOUNT=0, GCOUNT=1
--
-- Stores (explicitly or implicitly):
--      BITPIX, NAXIS1, NAXIS2, TFIELDS
--      TTYPE/TFORM/TUNIT<n> values for columns
--
data TableExt = TableExt {
  _teBitPix :: Int
  , _teNRows :: Int
  , _teNCols :: Int
  , _teWidth :: Int   -- width, in bytes, of each row
  , _teCols :: [ColData] -- would be nice to statically enforce size == _teNCols
  } deriving Show

-- This stores the TTYPEn, TFORNm, and TUNITn keywords. Would it be
-- useful to encode the Haskell type for the data in the type?
--
-- At present the description/comment for the field is not included
data ColData = ColData {
  _cdName :: String
  , _cdFormat :: String -- the FITS encoding of the data type
  , _cdUnit :: Maybe String
  } deriving (Eq, Show)

In [66]:
import Control.Monad (forM, unless)
import Data.Either (either)

In [67]:
-- Note that a function is sent in, rather than the map.  
createColumn :: (String -> Either String String) -> Int -> Either String ColData  
createColumn getS c = do
  let cs = show c
  cName <- getS ("TTYPE" ++ cs)
  cForm <- getS ("TFORM" ++ cs)
    
  -- Need to swap from Either to Maybe; note that I do not want to
  -- use cUnit <- getS ... otherwise the whole computation would
  -- fail if no unit was given.
  let cUnit = either (const Nothing) Just (getS ("TUNIT" ++ cs))
  
  Right ColData { _cdName = cName, _cdFormat = cForm, _cdUnit = cUnit }

getKeyValue :: Map.Map Keyword KeyValue -> String -> Either String KeyValue
getKeyValue xs kn = 
  let err = "Key not found: " ++ kn
  in maybe (Left err) Right (Map.lookup (Keyword kn) xs) 

createTable :: Map.Map Keyword KeyValue -> Either String TableExt
createTable m = do
  let get = getKeyValue m
  
      asString _ (KeyString s) = Right s
      asString k v             = Left (k ++ " not a string: " ++ show v)
      
      asInt _ (KeyInt i)       = Right i
      asInt k v                = Left (k ++ " not an int: " ++ show v)
      
      -- extract typed info 
      getS k = get k >>= asString k
      getI k = get k >>= asInt k
      
  ext <- getS "XTENSION"
  naxis <- getI "NAXIS"
  pcount <- getI "PCOUNT"
  gcount <- getI "GCOUNT"
  let conds = [ ext == "BINTABLE"
              , naxis == 2
              , pcount == 0
              , gcount == 1 ]
  unless (and conds) (Left ("invalid conditions: " ++ show conds))
  
  bpix <- getI "BITPIX"
  nbytes <- getI "NAXIS1"
  nrows <- getI "NAXIS2"
  ncols <- getI "TFIELDS"

  -- Here cols is Either String [ColData], where the
  -- string will represent the error from the fist
  -- column which failed.
  --
  cols <- forM [1..ncols] $ createColumn getS 
  let lc = length cols
  unless (lc == ncols) (Left ("Expected " ++ show ncols ++ " columns, found " ++ show lc))
  
  Right TableExt {
         _teBitPix = bpix
         , _teNCols = ncols
         , _teNRows = nrows
         , _teWidth = nbytes
         , _teCols = cols
       }

In [68]:
createTable hdrMap


Right (TableExt {_teBitPix = 8, _teNRows = 1070, _teNCols = 3, _teWidth = 12, _teCols = [ColData {_cdName = "ENERG_LO", _cdFormat = "1E", _cdUnit = Just "keV"},ColData {_cdName = "ENERG_HI", _cdFormat = "1E", _cdUnit = Just "keV"},ColData {_cdName = "SPECRESP", _cdFormat = "1E", _cdUnit = Just "cm**2"}]})

Now, the question is how to use this to parse the data: i.e. create a parser.


In [85]:
wrongMap = Map.insert (Keyword "TFIELDS") (KeyInt 4) hdrMap

In [86]:
Map.lookup (Keyword "TFIELDS") wrongMap


Just (KeyInt 4)

In [87]:
createTable wrongMap


Left "Key not found: TTYPE4"

The original map is unchanged:


In [88]:
Map.lookup (Keyword "TFIELDS") hdrMap


Just (KeyInt 3)

In [ ]: